Sustainable Software Ecosystems: Software Engineers, Domain Scientists, and Engineers Collaborating for Science
The development of scientific software is often a partnership between domain
scientists and scientific software engineers. It is especially important to
embrace these collaborations when developing advanced scientific software,
where sustainability, reproducibility, and extensibility are essential. In the
ideal case, as discussed in this manuscript, this brings together teams
composed of the world's foremost scientific experts in a given field with
seasoned software developers experienced in forming highly collaborative teams
working on software to further scientific research.
Comment: 4 pages, submission for WSSSPE
Sustainable Software Ecosystems for Open Science
Sustainable software ecosystems are difficult to build, and require concerted
effort, community norms and collaborations. In science it is especially
important to establish communities in which faculty, staff, students and
open-source professionals work together and treat software as a first-class
product of scientific investigation, just as mathematics is treated in the
physical sciences. Kitware has a rich history of establishing collaborative
projects in the science, engineering and medical research fields, and continues
to work on improving that model as new technologies and approaches become
available. This approach closely follows and is enhanced by the movement
towards practicing open, reproducible research in the sciences where data,
source code, methodology and approach are all available so that complex
experiments can be independently reproduced and verified.
Comment: Workshop on Sustainable Software: Practices and Experiences, 4 pages, 3 figures
Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform
Advances in detectors and computational technologies provide new
opportunities for applied research and the fundamental sciences. Concurrently,
dramatic increases in the three Vs (Volume, Velocity, and Variety) of
experimental data and the scale of computational tasks have produced demand for
new real-time processing systems at experimental facilities. Recently, this
demand was addressed by the Spark-MPI approach connecting the Spark
data-intensive platform with the MPI high-performance framework. In contrast
with existing data management and analytics systems, Spark introduced a new
middleware based on resilient distributed datasets (RDDs), which decoupled
various data sources from high-level processing algorithms. The RDD middleware
significantly advanced the scope of data-intensive applications, spreading from
SQL queries to machine learning to graph processing. Spark-MPI further extended
the Spark ecosystem to MPI applications via the Process Management
Interface. The paper explores this integrated platform within the context of
online ptychographic and tomographic reconstruction pipelines.
Comment: New York Scientific Data Summit, August 6-9, 201
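To make the RDD-to-MPI hand-off concrete, the sketch below follows the pattern the abstract describes: detector frames arrive as an RDD, lightweight preprocessing runs as ordinary Spark transformations, and each partition is then passed to a reconstruction step. The reconstruct_partition function is a hypothetical stand-in for an MPI-backed ptychographic solver; the actual Spark-MPI platform wires MPI ranks into Spark executors through the Process Management Interface rather than simply averaging frames as this toy does.

```python
# Sketch of an RDD -> MPI hand-off in the style of Spark-MPI.
# reconstruct_partition is a hypothetical stand-in for an MPI-side
# solver; the real platform connects MPI ranks to Spark executors
# through the Process Management Interface (PMI).
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("near-real-time-pipeline").getOrCreate()
sc = spark.sparkContext

def preprocess(frame):
    # Ordinary Spark transformation: offset removal and normalization.
    frame = frame - frame.min()
    return frame / max(frame.max(), 1e-12)

def reconstruct_partition(frames):
    # Each Spark partition becomes the input to one group of MPI ranks
    # in the real system; here a placeholder average stands in for the
    # reconstruction kernel.
    stack = np.stack(list(frames))
    yield stack.mean(axis=0)

# Simulated stream of detector frames, parallelized into an RDD.
frames = sc.parallelize([np.random.rand(64, 64) for _ in range(32)], numSlices=4)
partial = frames.map(preprocess).mapPartitions(reconstruct_partition)
print(len(partial.collect()), "partial reconstructions")
```

mapPartitions is the natural seam for the hand-off: one Spark partition corresponds to one communicator's worth of MPI work, so the RDD middleware keeps data sources decoupled from the high-performance kernel.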
Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)
Challenges related to development, deployment, and maintenance of reusable software for science are a growing concern. Many scientists’ research increasingly depends on the quality and availability of the software on which it is built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop.
Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of “software papers”, and the use of online systems, for example source code repositories like GitHub. This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science. These incentives can determine the extent to which developers are motivated to build software for the long term, for the use of others, and whether to work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software.
The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age
Computational Quantum Chemistry has developed into a powerful, efficient, reliable and increasingly routine tool for exploring the structure and properties of small to medium sized molecules. Many thousands of calculations are performed every day, some offering results which approach experimental accuracy. However, in contrast to other disciplines, such as crystallography or bioinformatics, where standard formats and well-known, unified databases exist, this QC data is generally destined to remain locally held in files which are not designed to be machine-readable. Only a very small subset of these results will become accessible to the wider community through publication.

In this paper we describe how the Quixote Project is developing the infrastructure required to convert output from a number of different molecular quantum chemistry packages to a common semantically rich, machine-readable format and to build repositories of QC results. Such an infrastructure offers benefits at many levels. The standardised representation of the results will facilitate software interoperability, for example making it easier for analysis tools to take data from different QC packages, and will also help with archival and deposition of results. The repository infrastructure, which is lightweight and built using Open software components, can be implemented at individual researcher, project, organisation or community level, offering the exciting possibility that in future many of these QC results can be made publicly available, to be searched and interpreted just as crystallography and bioinformatics results are today.

Although we believe that quantum chemists will appreciate the contribution the Quixote infrastructure can make to the organisation and exchange of their results, we anticipate that greater rewards will come from enabling their results to be consumed by a wider community. As the repositories grow they will become a valuable source of chemical data for use by other disciplines in both research and education.

The Quixote project is unconventional in that the infrastructure is being implemented in advance of a full definition of the data model which will eventually underpin it. We believe that a working system which offers real value to researchers based on tools and shared, searchable repositories will encourage early participation from a broader community, including both producers and consumers of data. In the early stages, searching and indexing can be performed on the chemical subject of the calculations and well-defined calculation metadata. The process of defining more specific quantum chemical definitions, adding them to dictionaries and extracting them consistently from the results of the various software packages can then proceed in an incremental manner, adding additional value at each stage.

Not only will these results help to change the data management model in the field of Quantum Chemistry, but the methodology can be applied to other pressing problems related to data in computational and experimental science.
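As a rough illustration of the conversion step described above, the sketch below pulls well-defined calculation metadata out of a plain-text log and into a keyed record that a repository could index and search. The log format, field names and regular expressions are invented for the example; the real Quixote tooling targets semantically rich representations such as Chemical Markup Language and copes with the idiosyncratic outputs of many different QC packages.

```python
# Hypothetical sketch of turning a package-specific QC output file into
# a common machine-readable record, in the spirit of the Quixote
# pipeline. Field names and the log format are invented for the example.
import json
import re

def parse_qc_output(text: str) -> dict:
    # Pull well-defined calculation metadata out of a plain-text log.
    record = {}
    patterns = {
        "method":       r"Method:\s*(\S+)",
        "basis_set":    r"Basis:\s*(\S+)",
        "total_energy": r"Total energy\s*=\s*(-?\d+\.\d+)",
    }
    for key, pat in patterns.items():
        m = re.search(pat, text)
        if m:
            record[key] = float(m.group(1)) if key == "total_energy" else m.group(1)
    return record

log = """\
Method: B3LYP
Basis: 6-31G*
Total energy = -76.408951
"""
print(json.dumps(parse_qc_output(log), indent=2))
```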
Real-time 3D analysis during electron tomography using tomviz
The demand for high-throughput electron tomography is rapidly increasing in
biological and material sciences. However, this 3D imaging technique is
computationally bottlenecked by alignment and reconstruction, which run from
hours to days. We demonstrate real-time tomography with dynamic 3D tomographic
visualization to enable rapid interpretation of specimen structure immediately
as data is collected on an electron microscope. Using geometrically complex
chiral nanoparticles, we show that volumetric interpretation can begin in less
than 10 minutes and a high-quality tomogram is available within 30 minutes.
Real-time tomography is integrated into tomviz, an open-source, cross-platform
3D analysis tool with intuitive graphical user interfaces (GUIs) that enable
any scientist to characterize biological and material structure in 3D.
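The reason interpretation can begin minutes into an acquisition is that the reconstruction can be updated incrementally as each projection arrives. The toy 2-D back-projection below, written against plain NumPy and SciPy with simulated data, illustrates only that property; it is a deliberately simplified stand-in, not the reconstruction pipeline tomviz ships.

```python
# Toy incremental back-projection: the running reconstruction improves
# as each new tilt-series projection arrives, which is what allows a
# coarse volume to be inspected long before acquisition finishes.
# Simplified stand-in, not the tomviz algorithm.
import numpy as np
from scipy.ndimage import rotate

size = 128
recon = np.zeros((size, size))       # running 2-D reconstruction
angles = np.linspace(-60, 60, 41)    # typical tilt range, in degrees

# Simulated specimen and its projections (line integrals along rows).
truth = np.zeros((size, size))
truth[40:80, 50:90] = 1.0

for i, angle in enumerate(angles, start=1):
    projection = rotate(truth, angle, reshape=False, order=1).sum(axis=0)
    # Smear the 1-D projection back across the plane at its tilt angle.
    smear = np.tile(projection, (size, 1))
    recon += rotate(smear, -angle, reshape=False, order=1)
    if i % 10 == 0:
        print(f"{i} projections incorporated; coarse image already viewable")
```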
Self-driving Multimodal Studies at User Facilities
Multimodal characterization is commonly required for understanding materials.
User facilities possess the infrastructure to perform these measurements,
albeit serially over days to months. In this paper, we describe a unified
multimodal measurement of a single sample library at distant instruments,
driven by a concert of distributed agents that use analysis from each modality
to inform the direction of the other in real time. Powered by the Bluesky
project at the National Synchrotron Light Source II, this experiment is a
world's first for beamline science, and provides a blueprint for future
approaches to multimodal and multifidelity experiments at user facilities.
Comment: 36th Conference on Neural Information Processing Systems (NeurIPS 2022), AI4Mat Workshop
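The control flow of such an analysis-in-the-loop experiment can be sketched with Bluesky's standard components and the simulated devices that ship with ophyd. The "agent" below is a trivial rule, refining each scan around the best point seen so far, and stands in for the paper's distributed multimodal agents; only the skeleton (run engine, streaming documents, feedback into the next plan) reflects ordinary Bluesky usage.

```python
# Minimal analysis-in-the-loop sketch using Bluesky's simulated devices.
# The refinement rule is a toy stand-in for the distributed agents
# described in the paper.
from bluesky import RunEngine
from bluesky.plans import scan
from ophyd.sim import det, motor

RE = RunEngine({})
readings = []

def collect(name, doc):
    # Record (position, intensity) pairs as event documents stream out.
    if name == "event":
        readings.append((doc["data"]["motor"], doc["data"]["det"]))

RE.subscribe(collect)

lo, hi = -5.0, 5.0
for round_ in range(3):
    readings.clear()
    RE(scan([det], motor, lo, hi, num=9))
    best_x, _ = max(readings, key=lambda r: r[1])
    # Agent decision: zoom the next scan in around the current optimum.
    span = (hi - lo) / 4
    lo, hi = best_x - span, best_x + span
    print(f"round {round_}: refining around x = {best_x:.3f}")
```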
Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on
Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards.

Results: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry.

Conclusions: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to the development of many useful resources freely available to the chemistry community.
The physical and structural properties of thiol encapsulated gold nanoparticle Langmuir-Schaeffer films
EThOS - Electronic Theses Online Service, United Kingdom